
Keyword Search Results

[Keyword] machine learning (172 hits)

Showing hits 141-160 of 172

  • Machine Learning in Computer-Aided Diagnosis of the Thorax and Colon in CT: A Survey Open Access

    Kenji SUZUKI  

     
    INVITED SURVEY PAPER

      Vol:
    E96-D No:4
      Page(s):
    772-783

    Computer-aided detection (CADe) and diagnosis (CAD) have been a rapidly growing, active area of research in medical imaging. Machine learning (ML) plays an essential role in CAD, because objects such as lesions and organs may not be represented accurately by a simple equation; thus, medical pattern recognition essentially requires “learning from examples.” One of the most popular uses of ML is the classification of objects such as lesion candidates into certain classes (e.g., abnormal or normal, and lesions or non-lesions) based on input features (e.g., contrast and area) obtained from segmented lesion candidates. The task of ML is to determine “optimal” boundaries for separating classes in the multi-dimensional feature space formed by the input features. ML algorithms for classification include linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), multilayer perceptrons, and support vector machines (SVMs). Recently, pixel/voxel-based ML (PML) has emerged in medical image processing/analysis; it uses pixel/voxel values in images directly as input information, instead of features calculated from segmented lesions, so that feature calculation or segmentation is not required. In this paper, ML techniques used in CAD schemes for the detection and diagnosis of lung nodules in thoracic CT and for the detection of polyps in CT colonography (CTC) are surveyed and reviewed.
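    The feature-based classification step described in this abstract can be pictured with a minimal sketch, assuming scikit-learn as the ML library and purely synthetic feature values (contrast, area) for the lesion candidates; it illustrates feature-based classification in general, not the paper's own pipeline.

    ```python
    # Minimal sketch of feature-based lesion-candidate classification (synthetic data).
    # Assumes scikit-learn; the features (contrast, area) and labels are illustrative only.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    # Synthetic candidates: columns are (contrast, area); 1 = lesion, 0 = non-lesion.
    lesions = rng.normal(loc=[0.8, 120.0], scale=[0.1, 20.0], size=(50, 2))
    non_lesions = rng.normal(loc=[0.4, 60.0], scale=[0.1, 20.0], size=(200, 2))
    X = np.vstack([lesions, non_lesions])
    y = np.array([1] * 50 + [0] * 200)

    # The SVM determines a decision boundary separating the classes in feature space.
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    clf.fit(X, y)
    print(clf.predict([[0.75, 110.0], [0.35, 55.0]]))  # typically [1 0] for this data
    ```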

  • Bayesian Estimation of Multi-Trap RTN Parameters Using Markov Chain Monte Carlo Method

    Hiromitsu AWANO  Hiroshi TSUTSUI  Hiroyuki OCHI  Takashi SATO  

     
    PAPER-Device and Circuit Modeling and Analysis

      Vol:
    E95-A No:12
      Page(s):
    2272-2283

    Random telegraph noise (RTN) is a phenomenon that is considered to limit the reliability and performance of circuits using advanced devices. The time constants of carrier capture and emission and the associated change in the threshold voltage are important parameters commonly included in various models, but their extraction from time-domain observations has been a difficult task. In this study, we propose a statistical method for simultaneously estimating interrelated parameters: the time constants and magnitude of the threshold voltage shift. Our method is based on a graphical network representation, and the parameters are estimated using the Markov chain Monte Carlo method. Experimental application of the proposed method to synthetic and measured time-domain RTN signals was successful. The proposed method can handle interrelated parameters of multiple traps and thereby contributes to the construction of more accurate RTN models.
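    As a rough illustration of the estimation idea, the sketch below runs a Metropolis-Hastings sampler for a single time constant of one trap, assuming exponentially distributed dwell times and a flat prior on log(tau); the paper's actual method jointly estimates the interrelated parameters of multiple traps through a graphical network representation, which this toy example does not attempt.

    ```python
    # Much-simplified Metropolis-Hastings sketch for one RTN time constant (illustrative only).
    import numpy as np

    rng = np.random.default_rng(1)
    true_tau = 5e-3                              # "true" capture time constant (s), synthetic
    dwell = rng.exponential(true_tau, size=200)  # synthetic dwell times in the captured state

    def log_post(log_tau):
        """Log-posterior of log(tau): exponential likelihood, flat prior on log(tau)."""
        tau = np.exp(log_tau)
        return -len(dwell) * np.log(tau) - dwell.sum() / tau

    samples, cur = [], np.log(1e-3)              # crude initial guess
    for _ in range(20000):
        prop = cur + rng.normal(scale=0.1)       # random-walk proposal in log space
        if np.log(rng.random()) < log_post(prop) - log_post(cur):
            cur = prop
        samples.append(cur)

    post_tau = np.exp(samples[5000:])            # discard burn-in
    print(f"posterior mean tau ~ {post_tau.mean():.2e} s (true {true_tau:.1e} s)")
    ```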

  • Dimensionality Reduction by Locally Linear Discriminant Analysis for Handwritten Chinese Character Recognition

    Xue GAO  Jinzhi GUO  Lianwen JIN  

     
    PAPER-Image Recognition, Computer Vision

      Vol:
    E95-D No:10
      Page(s):
    2533-2543

    Linear discriminant analysis (LDA) is one of the most popular dimensionality reduction techniques in existing handwritten Chinese character (HCC) recognition systems. However, when used for unconstrained handwritten Chinese character recognition, the traditional LDA algorithm is prone to two problems, namely, the class separation problem and multimodal sample distributions. To deal with these problems, we propose a new locally linear discriminant analysis (LLDA) method for handwritten Chinese character recognition. Our algorithm operates as follows. (1) Using a clustering algorithm, find clusters for the samples of each class. (2) For each cluster of one class, find the nearest neighboring clusters from the remaining classes. Then, use the corresponding cluster means to compute the between-class scatter matrix in LDA while keeping the within-class scatter matrix unchanged. (3) Finally, apply feature vector normalization to further alleviate the class separation problem. A series of experiments on both the HCL2000 and CASIA Chinese character handwriting databases show that our method can effectively improve recognition performance, with a reduction in error rate of 28.7% (HCL2000) and 16.7% (CASIA) compared with the traditional LDA method. Our algorithm also outperforms DLA (Discriminative Locality Alignment, one of the representative manifold-learning-based dimensionality reduction algorithms proposed recently). Large-set handwritten Chinese character recognition experiments also verify the effectiveness of the proposed approach.
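    A minimal sketch of steps (1) and (2), assuming numpy arrays and scikit-learn's KMeans for the clustering step (the paper does not prescribe these particular tools), is given below; step (3), feature vector normalization, and the final projection are omitted.

    ```python
    # Sketch of steps (1)-(2) of the LLDA idea (hypothetical names; scikit-learn assumed).
    # X: (n_samples, n_features) numpy array, y: numpy array of class labels.
    import numpy as np
    from sklearn.cluster import KMeans

    def local_between_class_scatter(X, y, n_clusters=2):
        # (1) Find clusters for the samples of each class.
        means = {}  # class -> array of cluster means
        for c in np.unique(y):
            km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X[y == c])
            means[c] = km.cluster_centers_

        # (2) For each cluster, find its nearest cluster from the remaining classes and
        #     accumulate the between-class scatter from the corresponding pair of means.
        d = X.shape[1]
        Sb = np.zeros((d, d))
        for c, own in means.items():
            others = np.vstack([m for cc, m in means.items() if cc != c])
            for mu in own:
                nearest = others[np.argmin(np.linalg.norm(others - mu, axis=1))]
                diff = (mu - nearest)[:, None]
                Sb += diff @ diff.T
        return Sb
    ```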

  • Asymmetric Learning Based on Kernel Partial Least Squares for Software Defect Prediction

    Guangchun LUO  Ying MA  Ke QIN  

     
    LETTER-Software Engineering

      Vol:
    E95-D No:7
      Page(s):
    2006-2008

    An asymmetric classifier based on kernel partial least squares is proposed for software defect prediction. This method improves the prediction performance on imbalanced data sets. The experimental results validate its effectiveness.

  • Active Learning for Software Defect Prediction

    Guangchun LUO  Ying MA  Ke QIN  

     
    LETTER-Software Engineering

      Vol:
    E95-D No:6
      Page(s):
    1680-1683

    An active learning method, called the Two-stage Active Learning (TAL) algorithm, is developed for software defect prediction. By combining clustering and support vector machine techniques, this method improves the performance of the predictor with less labeling effort. Experiments validate its effectiveness.
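    One plausible reading of the clustering-plus-SVM combination is sketched below, assuming scikit-learn: cluster the unlabeled modules, query labels only for the instances nearest the cluster centers, and train an SVM on that small labeled set. The function and variable names are hypothetical, and this is not necessarily the exact TAL procedure from the letter.

    ```python
    # Rough sketch of a cluster-then-query active learning loop (illustrative only).
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    def active_defect_predictor(X_unlabeled, oracle, n_queries=20):
        """oracle(i) returns the true defect label of instance i (the costly labeling step)."""
        km = KMeans(n_clusters=n_queries, n_init=10, random_state=0).fit(X_unlabeled)
        # Stage 1: query the instance closest to each cluster center.
        query_idx = [int(np.argmin(np.linalg.norm(X_unlabeled - c, axis=1)))
                     for c in km.cluster_centers_]
        y_queried = np.array([oracle(i) for i in query_idx])
        # Stage 2: train an SVM on the small labeled set and predict the remaining modules
        # (assumes both classes appear among the queried labels).
        clf = SVC(kernel="rbf").fit(X_unlabeled[query_idx], y_queried)
        return clf.predict(X_unlabeled)
    ```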

  • Autonomous Throughput Improvement Scheme Using Machine Learning Algorithms for Heterogeneous Wireless Networks Aggregation

    Yohsuke KON  Kazuki HASHIGUCHI  Masato ITO  Mikio HASEGAWA  Kentaro ISHIZU  Homare MURAKAMI  Hiroshi HARADA  

     
    PAPER

      Vol:
    E95-B No:4
      Page(s):
    1143-1151

    Optimizing aggregation schemes for heterogeneous wireless networks is important for maximizing communication throughput over all available radio access networks. In heterogeneous networks, differences in the quality of service (QoS) of the networks, such as throughput, delay and packet loss rate, make it difficult to maximize the aggregation throughput. In this paper, we first analyze the influence of such QoS differences on the aggregation throughput and show that the throughput can be improved by adjusting the parameters of the aggregation system. Since manual parameter optimization is difficult and time-consuming, we propose an autonomous parameter tuning scheme using a machine learning algorithm for heterogeneous wireless network aggregation. We implement the proposed scheme on a heterogeneous cognitive radio network system. The results on our experimental network with network emulators show that the proposed scheme improves the aggregation throughput more than the conventional schemes do. We also evaluate the performance using public wireless network services, such as HSDPA, WiMAX and W-CDMA, and verify that the proposed scheme can improve the aggregation throughput by iterating the learning cycle even on public wireless networks. Our experimental results show that the proposed scheme achieves roughly twice the aggregation throughput of the conventional schemes.

  • Kernel Based Asymmetric Learning for Software Defect Prediction

    Ying MA  Guangchun LUO  Hao CHEN  

     
    LETTER-Software Engineering

      Vol:
    E95-D No:1
      Page(s):
    267-270

    A kernel based asymmetric learning method is developed for software defect prediction. This method improves the performance of the predictor on class imbalanced data, since it is based on kernel principal component analysis. An experiment validates its effectiveness.
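    As an illustrative stand-in (not the letter's exact formulation), one way to combine kernel PCA with asymmetric learning is to extract nonlinear components with KernelPCA and then train a classifier whose class weights penalize missed defects more heavily; the sketch below assumes scikit-learn and hypothetical weight values.

    ```python
    # Illustrative stand-in for kernel-PCA-based asymmetric learning on imbalanced defect data:
    # nonlinear features from KernelPCA followed by a class-weighted SVM.
    from sklearn.pipeline import make_pipeline
    from sklearn.decomposition import KernelPCA
    from sklearn.svm import SVC

    def make_asymmetric_defect_model(n_components=10, defect_weight=5.0):
        return make_pipeline(
            KernelPCA(n_components=n_components, kernel="rbf"),
            # Penalize misclassified defective modules (label 1) more heavily than clean ones.
            SVC(kernel="linear", class_weight={0: 1.0, 1: defect_weight}),
        )
    ```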

  • Adaptive Online Prediction Using Weighted Windows

    Shin-ichi YOSHIDA  Kohei HATANO  Eiji TAKIMOTO  Masayuki TAKEDA  

     
    PAPER

      Vol:
    E94-D No:10
      Page(s):
    1917-1923

    We propose online prediction algorithms for data streams whose characteristics may change over time. Our algorithms are applications of online learning with experts. In particular, our algorithms combine base predictors over sliding windows of different lengths as experts. As a result, our algorithms are guaranteed to be competitive with the base predictor using the best fixed-length sliding window in hindsight.
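    The idea of treating sliding windows of different lengths as experts can be sketched with an exponentially weighted average forecaster, as below; the weight update shown is a generic experts-style rule under assumed parameters, and the paper's actual algorithm and guarantees may differ.

    ```python
    # Sketch of combining sliding-window base predictors as experts with an exponentially
    # weighted average forecaster (window lengths and learning rate are assumed values).
    import numpy as np

    def predict_stream(stream, window_lengths=(5, 10, 50), eta=0.5):
        history, predictions = [], []
        weights = np.ones(len(window_lengths))
        for x in stream:
            # Each expert predicts the mean of its own sliding window of past values.
            experts = np.array([np.mean(history[-w:]) if history else 0.0
                                for w in window_lengths])
            predictions.append(float(np.dot(weights, experts) / weights.sum()))
            # Exponentially weighted update: experts with smaller squared loss gain weight.
            weights *= np.exp(-eta * (experts - x) ** 2)
            weights /= weights.sum()
            history.append(x)
        return predictions

    # Hypothetical drifting stream: the short-window experts adapt faster after the level shift.
    print(predict_stream([1.0] * 30 + [5.0] * 30)[-3:])
    ```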

  • Voting-Based Ensemble Classifiers to Detect Hedges and Their Scopes in Biomedical Texts

    Huiwei ZHOU  Xiaoyan LI  Degen HUANG  Yuansheng YANG  Fuji REN  

     
    PAPER-Artificial Intelligence, Data Mining

      Vol:
    E94-D No:10
      Page(s):
    1989-1997

    Previous studies in pattern recognition have shown that classifier ensemble approaches can lead to better recognition results. In this paper, we apply the voting technique to the CoNLL-2010 shared task of detecting hedge cues and their scope in biomedical texts. Six machine-learning-based systems are combined through three different voting schemes. We demonstrate the effectiveness of classifier ensemble approaches and compare the performance of the three voting schemes for hedge cue and scope detection. Experiments on the CoNLL-2010 evaluation data show that our best system achieves an F-score of 87.49% on the hedge detection task and 60.87% on the scope-finding task, which is significantly better than previous systems.
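    A minimal sketch of one possible voting scheme, weighted per-token voting over aligned system outputs, is shown below with hypothetical labels and weights; the paper evaluates three concrete schemes over six systems, which this toy example does not reproduce.

    ```python
    # Weighted voting over per-token hedge-cue predictions of several systems (hypothetical data).
    from collections import defaultdict

    def weighted_vote(system_outputs, system_weights):
        """system_outputs: aligned label sequences, one per system; system_weights: one weight each."""
        voted = []
        for labels in zip(*system_outputs):
            scores = defaultdict(float)
            for label, w in zip(labels, system_weights):
                scores[label] += w
            voted.append(max(scores, key=scores.get))
        return voted

    # Hypothetical per-token labels ("C" = hedge cue, "O" = outside) from three systems.
    outputs = [["O", "C", "O"], ["O", "C", "C"], ["O", "O", "C"]]
    print(weighted_vote(outputs, system_weights=[0.9, 0.8, 0.6]))  # ['O', 'C', 'C']
    ```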

  • Integration of Multiple Bilingually-Trained Segmentation Schemes into Statistical Machine Translation

    Michael PAUL  Andrew FINCH  Eiichiro SUMITA  

     
    PAPER-Natural Language Processing

      Vol:
    E94-D No:3
      Page(s):
    690-697

    This paper proposes an unsupervised word segmentation algorithm that identifies word boundaries in continuous source language text in order to improve the translation quality of statistical machine translation (SMT) approaches. The method can be applied to any language pair in which the source language is unsegmented and the target language segmentation is known. In the first step, an iterative bootstrap method is applied to learn multiple segmentation schemes that are consistent with the phrasal segmentations of an SMT system trained on the resegmented bitext. In the second step, multiple segmentation schemes are integrated into a single SMT system by characterizing the source language side and merging identical translation pairs of differently segmented SMT models. Experimental results translating five Asian languages into English revealed that the proposed method of integrating multiple segmentation schemes outperforms SMT models trained on any of the learned word segmentations and performs comparably to available monolingually built segmentation tools.

  • A Comparative Study of Unsupervised Anomaly Detection Techniques Using Honeypot Data

    Jungsuk SONG  Hiroki TAKAKURA  Yasuo OKABE  Daisuke INOUE  Masashi ETO  Koji NAKAO  

     
    PAPER-Information Network

      Vol:
    E93-D No:9
      Page(s):
    2544-2554

    Intrusion detection systems (IDSs) have received considerable attention from network security researchers as one of the most promising countermeasures for defending crucial computer systems and networks against attackers on the Internet. Over the past few years, many machine learning techniques have been applied to IDSs to improve their performance and to construct them with low cost and effort. In particular, unsupervised anomaly detection techniques have a significant advantage in their capability to identify unforeseen attacks, i.e., 0-day attacks, and to build intrusion detection models without any labeled (i.e., pre-classified) training data in an automated manner. In this paper, we conduct a set of experiments to evaluate and analyze the performance of the major unsupervised anomaly detection techniques, using real traffic data obtained from honeypots deployed inside and outside the campus network of Kyoto University, under various evaluation criteria: performance evaluation by similarity measurements and the size of training data, overall performance, detection ability for unknown attacks, and time complexity. Our experimental results provide practical and useful guidelines for IDS researchers and operators, so that they can gain insight into applying these techniques to intrusion detection and devise more effective intrusion detection models.

  • A Corpus-Based Approach for Automatic Thai Unknown Word Recognition Using Boosting Techniques

    Jakkrit TECHO  Cholwich NATTEE  Thanaruk THEERAMUNKONG  

     
    PAPER-Unknown Word Processing

      Vol:
    E92-D No:12
      Page(s):
    2321-2333

    While classification techniques can be applied to automatic unknown word recognition in a language without word boundaries, they face the problem of unbalanced datasets, in which the number of positive unknown-word candidates is dominantly smaller than that of negative candidates. To solve this problem, this paper presents a corpus-based approach that introduces a so-called group-based ranking evaluation technique into ensemble learning in order to generate a sequence of classification models that later collaborate to select the most probable unknown word from multiple candidates. Given a classification model, group-based ranking evaluation (GRE) is applied to construct a training dataset for learning the succeeding model, by weighting each of its candidates according to their ranks and correctness when the candidates of an unknown word are considered as one group. A number of experiments were conducted on a large Thai medical text corpus to evaluate the performance of the proposed group-based ranking evaluation approach, namely V-GRE, compared with the conventional naive Bayes classifier and our vanilla version without ensemble learning. As a result, the proposed method achieves an accuracy of 90.93±0.50% when the first rank is selected and 97.26±0.26% when the top ten candidates are considered, which is an improvement of 8.45% and 6.79%, respectively, over the conventional record-based naive Bayes classifier and the vanilla version. Another result, applying only the best features, shows 93.93±0.22% and up to 98.85±0.15% accuracy for top-1 and top-10, respectively, improvements of 3.97% and 9.78% over naive Bayes and the vanilla version. Finally, an error analysis is given.

  • Intelligent Extraction of a Digital Watermark from a Distorted Image

    Asifullah KHAN  Syed Fahad TAHIR  Tae-Sun CHOI  

     
    LETTER-Application Information Security

      Vol:
    E91-D No:7
      Page(s):
    2072-2075

    We present a novel approach to developing Machine Learning (ML) based decoding models for extracting a watermark in the presence of attacks. Statistical characterization of the components of various frequency bands is exploited to allow blind extraction of the watermark. Experimental results show that the proposed ML based decoding scheme can adapt to suit the watermark application by learning the alterations in the feature space incurred by the attack employed.

  • Sentence Topics Based Knowledge Acquisition for Question Answering

    Hyo-Jung OH  Bo-Hyun YUN  

     
    PAPER-Knowledge Engineering

      Vol:
    E91-D No:4
      Page(s):
    969-975

    This paper presents a knowledge acquisition method that uses sentence topics for question answering. We define templates for information extraction semi-automatically using the Korean concept network. Moreover, we propose a two-phase information extraction model based on hybrid machine learning methods, namely maximum entropy and conditional random fields. In our experiments, we examined the role of sentence topics in the template-filling task for information extraction. Our experimental results show an improvement of 18% in F-score and 434% in training speed over the plain CRF-based method for the extraction task. In addition, our results show an improvement of 8% in F-score for the subsequent QA task.

  • SVM and Collaborative Filtering-Based Prediction of User Preference for Digital Fashion Recommendation Systems

    Hanhoon KANG  Seong Joon YOO  

     
    LETTER-Data Mining

      Vol:
    E90-D No:12
      Page(s):
    2100-2103

    In this paper, we describe a method of applying collaborative filtering with a machine learning technique to predict users' preferences for clothes in online shopping malls when user history is insufficient. In particular, we experiment with methods of predicting missing values, such as the mean value, SVD, and support vector regression, to find the best method and to develop and utilize a unique feature vector model.
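    The missing-value strategies mentioned above (mean value vs. support vector regression) can be contrasted on a toy rating matrix as in the sketch below; the data are invented and scikit-learn's SVR is assumed, so this only illustrates the general idea rather than the paper's experimental setup.

    ```python
    # Toy comparison of two missing-value strategies: column mean vs. SVR trained on the
    # other items' ratings (hypothetical data; scikit-learn assumed).
    import numpy as np
    from sklearn.svm import SVR

    ratings = np.array([            # rows: users, columns: clothing items; nan = unrated
        [5.0, 4.0, 1.0],
        [4.0, 5.0, 2.0],
        [1.0, 2.0, 5.0],
        [5.0, np.nan, 1.0],         # predict this user's rating of item 1
    ], dtype=float)

    target_user, target_item = 3, 1
    known = ~np.isnan(ratings[:, target_item])
    other_items = [j for j in range(ratings.shape[1]) if j != target_item]

    mean_fill = ratings[known, target_item].mean()
    svr = SVR(kernel="rbf").fit(ratings[known][:, other_items], ratings[known, target_item])
    svr_fill = svr.predict(ratings[[target_user]][:, other_items])[0]
    print(f"mean fill: {mean_fill:.2f}, SVR fill: {svr_fill:.2f}")
    ```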

  • A Machine Learning Approach for an Indonesian-English Cross Language Question Answering System

    Ayu PURWARIANTI  Masatoshi TSUCHIYA  Seiichi NAKAGAWA  

     
    PAPER-Natural Language Processing

      Vol:
    E90-D No:11
      Page(s):
    1841-1852

    We have built a CLQA (Cross Language Question Answering) system for a source language with limited data resources (e.g., Indonesian) using a machine learning approach. The CLQA system consists of four modules: a question analyzer, a keyword translator, a passage retriever and an answer finder. We use machine learning in two modules: the question classifier (part of the question analyzer) and the answer finder. In the question classifier, we classify the EAT (Expected Answer Type) of a question using an SVM (Support Vector Machine). The features for the classification module are basically the output of our shallow question parsing module. To improve the classification score, we use statistical information extracted from our Indonesian corpus. In the answer finder module, instead of the common approach in which the answer is located by matching the named entities of the word corpus with the EAT of the question, we locate the answer by text chunking the word corpus. The features for the SVM-based text chunking process consist of question features, word corpus features and similarity scores between the word corpus and the question keywords. In this way, we eliminate the named entity tagging process for the target document. As for the keyword translator module, we use an Indonesian-English dictionary to translate Indonesian keywords into English. We also use some simple patterns to transform some borrowed English words. The keywords are then combined into boolean queries in order to retrieve relevant passages using IDF scores. We first conducted an experiment using 2,837 questions (about 10% of which were used as test data) obtained from 18 Indonesian college students. We then conducted a similar experiment using the NTCIR (NII Test Collection for IR Systems) 2005 CLQA task by translating the English questions into Indonesian. Compared with the Japanese-English and Chinese-English CLQA results in NTCIR 2005, our system is superior to all but one system, which uses richer data resources (three dictionaries). Furthermore, a rough comparison with two other Indonesian-English CLQA systems revealed that our system achieves a higher accuracy score.

  • A Model-Based Learning Process for Modeling Coarticulation of Human Speech

    Jianguo WEI  Xugang LU  Jianwu DANG  

     
    PAPER

      Vol:
    E90-D No:10
      Page(s):
    1582-1591

    Machine learning techniques have long been applied in many fields with considerable success. The purpose of a learning process is generally to obtain a set of parameters, based on a given data set, by minimizing an objective function that explains the data set in a maximum likelihood or minimum estimation error sense. However, most learned parameters are highly data dependent and rarely reflect the true physical mechanism behind the observed data. In order to obtain the inherent knowledge contained in the observed data, it is necessary to combine physical models with the learning process rather than merely fitting the observations with a black-box model. To reveal the underlying properties of human speech production, we propose a learning process based on a physiological articulatory model and a coarticulation model, both of which are derived from human mechanisms. A two-layer learning framework was designed to learn the parameters at the physiological level using the physiological articulatory model and the parameters at the motor planning level using the coarticulation model. The learning process was carried out on an articulatory database of human speech production. The learned parameters were evaluated by numerical experiments and listening tests. The phonetic targets obtained in the planning stage provide evidence for understanding the virtual targets of human speech production. As a result, the model-based learning process reveals the inherent mechanisms of human speech via learned parameters with physical meaning.

  • Machine Learning Based English-to-Korean Transliteration Using Grapheme and Phoneme Information

    Jong-Hoon OH  Key-Sun CHOI  

     
    PAPER-Natural Language Processing

      Vol:
    E88-D No:7
      Page(s):
    1737-1748

    Machine transliteration is an automatic method for generating characters or words in one alphabetical system from the corresponding characters in another alphabetical system. Machine transliteration can play an important role in natural language applications such as information retrieval and machine translation, especially for handling proper nouns and technical terms. Previous works focus on either grapheme-based or phoneme-based methods. However, transliteration is both an orthographic and a phonetic conversion process; therefore, both grapheme and phoneme information should be considered in machine transliteration. In this paper, we propose a grapheme- and phoneme-based transliteration model and compare it with previous grapheme-based and phoneme-based models using several machine learning techniques. Our method shows about a 13-78% performance improvement.

  • Improving Keyword Recognition of Spoken Queries by Combining Multiple Speech Recognizer's Outputs for Speech-driven WEB Retrieval Task

    Masahiko MATSUSHITA  Hiromitsu NISHIZAKI  Takehito UTSURO  Seiichi NAKAGAWA  

     
    PAPER-Spoken Language Systems

      Vol:
    E88-D No:3
      Page(s):
    472-480

    This paper presents speech-driven Web retrieval models that accept spoken search topics (queries) in the NTCIR-3 Web retrieval task. The major focus of this paper is on improving the speech recognition accuracy of spoken queries, and thereby the retrieval accuracy, in speech-driven Web retrieval. We experimentally evaluated techniques for combining the outputs of multiple LVCSR models in the recognition of spoken queries. As model combination techniques, we compared an SVM learning technique with conventional voting schemes such as ROVER. In addition, to investigate the effect of the language model's vocabulary size on retrieval performance, we prepared two language models, one with a vocabulary of 20,000 words and the other with 60,000 words, and evaluated the differences in the recognition rates of the spoken queries and in the retrieval performance. We show that combining multiple LVCSR models improves both speech recognition and retrieval accuracy in speech-driven text retrieval. Comparing the retrieval accuracies obtained with the 20,000-word and 60,000-word language models in the LVCSR system, we found that the larger the vocabulary, the better the retrieval accuracy.
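    The simplest of the combination ideas, ROVER-style word-level voting, can be sketched as below for hypotheses that are already aligned word by word; real ROVER additionally performs dynamic-programming alignment and confidence weighting, and the SVM-based combination studied in the paper is more involved.

    ```python
    # Simplified ROVER-style voting over already-aligned recognizer hypotheses (illustrative only).
    from collections import Counter

    def vote_words(aligned_hypotheses):
        """aligned_hypotheses: word sequences of equal length, one per LVCSR model."""
        return [Counter(words).most_common(1)[0][0] for words in zip(*aligned_hypotheses)]

    # Hypothetical outputs of three recognizers for a short spoken query.
    hyps = [["web", "retrieval", "task"],
            ["web", "retrieval", "tusk"],
            ["wet", "retrieval", "task"]]
    print(vote_words(hyps))  # ['web', 'retrieval', 'task']
    ```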

  • User Preference Mining through Hybrid Collaborative Filtering and Content-Based Filtering in Recommendation System

    Kyung-Yong JUNG  Jung-Hyun LEE  

     
    PAPER-Artificial Intelligence and Cognitive Science

      Vol:
    E87-D No:12
      Page(s):
    2781-2790

    The growth of the Internet has resulted in an increasing need for personalized information systems. This paper describes an autonomous agent, the Web Robot Agent (WebBot), which integrates with the Web and acts as a personal recommendation system that cooperates with the user to identify interesting pages. The Apriori algorithm extracts the characteristics of web pages in the form of semantically related association words and mines a bag of association words. Using hybrid components from collaborative filtering and content-based filtering, this hybrid recommendation system can overcome the shortcomings associated with traditional recommendation systems. In this paper, we present an improved recommendation system that mines user preferences through hybrid two-way filtering. The proposed method was tested on a database, and its effectiveness compared with existing methods was demonstrated in online experiments.
